The main purpose of this project was to determine the impact of applying leading-edge technology in education on students’ academic behavior. To achieve this purpose, a public Kaggle dataset was used. The analysis consists of two parts: exploratory analysis and hypothesis testing. In the exploratory part, the presence of missing data and the distribution, nature, and possible values of the variables in the dataset were examined using plots and other relevant functions in R. In the second part, four groups of hypotheses were formulated and tested with the corresponding statistical methods. Associations between students’ responsible parent and parents’ satisfaction, between parents’ satisfaction and students’ final grade, and between responsible parent and students’ final grade were examined using chi-squared tests. An independent-samples t-test was employed to examine the difference in hand raising between male and female students, and a correlation test was used to assess the relationship between announcement viewing and participation in discussion. The final part of the analysis used multiple linear regression to examine whether students’ hand raising can be predicted from visiting resources and viewing announcements.
The exploratory analysis indicated that the dataset has no missing values and that none of the quantitative variables is normally distributed. The hypothesis tests showed statistically significant associations between students’ responsible parent and parents’ satisfaction, between parents’ satisfaction and students’ final grade, and between responsible parent and students’ final grade. The t-test indicated a statistically significant difference in hand-raising behavior between female and male students [t(478) = 3.32, p < .001]. The correlation test indicated a statistically significant association between participation in discussion and announcement viewing [τ = 0.285, p < .0001]. Moreover, a significant regression equation was found [F(2, 477) = 306.12, p < .0001] for predicting students’ hand raising from visiting resources and viewing announcements. Based on these results, it was concluded that applying leading-edge technology in education can contribute positively to improving students’ academic behavior.
The dataset (“Students’ Academic Performance Dataset”) used in this project was accessed from the Kaggle public dataset repository (https://www.kaggle.com/aljarah/xAPI-Edu-Data/data). It is provided in CSV format (xAPI-Edu-Data.csv) together with its codebook, and consists of 480 participants and 16 variables. As stated in the introduction of the codebook, the variables fall into three major categories: (1) demographic features such as gender and nationality; (2) academic background features such as educational stage, grade level, and section; (3) behavioral features such as raising a hand in class, opening resources, parents answering the survey, and parents’ school satisfaction. The dataset has no missing values. The data were collected from a learning management system (LMS) designed to facilitate learning through leading-edge technology; the system gives users synchronous access to educational resources from any device with an Internet connection. The data were gathered with a learner activity tracker tool, a component of the training and learning architecture that monitors students’ learning progress and actions such as reading an article or watching a training video. More details about the dataset, including its codebook, are available at https://www.kaggle.com/aljarah/xAPI-Edu-Data
Load the required packages
library(tidyverse)
library(broom)
library(psych)
library(stargazer)
library(car)
library(ggfortify)
Import the dataset
given_dataset<- read_csv(file="/Users/biruk/Dropbox/Final Project in R/xAPI-Edu-Data.csv")
Study the variables in the dataset
Explore the nature of the variables in the dataset using summary function
summary(given_dataset)
## gender NationalITy PlaceofBirth
## Length:480 Length:480 Length:480
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
## StageID GradeID SectionID
## Length:480 Length:480 Length:480
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
## Topic Semester Relation raisedhands
## Length:480 Length:480 Length:480 Min. : 0.00
## Class :character Class :character Class :character 1st Qu.: 15.75
## Mode :character Mode :character Mode :character Median : 50.00
## Mean : 46.77
## 3rd Qu.: 75.00
## Max. :100.00
## VisITedResources AnnouncementsView Discussion ParentAnsweringSurvey
## Min. : 0.0 Min. : 0.00 Min. : 1.00 Length:480
## 1st Qu.:20.0 1st Qu.:14.00 1st Qu.:20.00 Class :character
## Median :65.0 Median :33.00 Median :39.00 Mode :character
## Mean :54.8 Mean :37.92 Mean :43.28
## 3rd Qu.:84.0 3rd Qu.:58.00 3rd Qu.:70.00
## Max. :99.0 Max. :98.00 Max. :99.00
## ParentschoolSatisfaction StudentAbsenceDays Class
## Length:480 Length:480 Length:480
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
Study missing values in the dataset
sapply(given_dataset,function(x) sum(is.na(x)))
## gender NationalITy PlaceofBirth
## 0 0 0
## StageID GradeID SectionID
## 0 0 0
## Topic Semester Relation
## 0 0 0
## raisedhands VisITedResources AnnouncementsView
## 0 0 0
## Discussion ParentAnsweringSurvey ParentschoolSatisfaction
## 0 0 0
## StudentAbsenceDays Class
## 0 0
None of the variables in the dataset has missing values
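As an aside, base R’s colSums() gives the same per-column NA counts in a single call. A minimal sketch on a small mock data frame (the frame and its values are hypothetical stand-ins for the full dataset):

```r
# Hypothetical mini data frame standing in for the real dataset
demo <- data.frame(gender      = c("M", "F", "M"),
                   raisedhands = c(10, NA, 30))

# colSums(is.na(.)) counts missing values per column,
# equivalent to sapply(demo, function(x) sum(is.na(x)))
na_counts <- colSums(is.na(demo))
na_counts  # gender: 0, raisedhands: 1
```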
Exclude variables not to be used in this project
performance_data<-given_dataset %>%
select(-PlaceofBirth, -StageID, -GradeID, -SectionID, -Semester, -ParentAnsweringSurvey, -StudentAbsenceDays)
Explore the possible values of the non-continuous variables
performance_data %>% distinct(gender)
## # A tibble: 2 x 1
## gender
## <chr>
## 1 M
## 2 F
performance_data %>% distinct(Topic)
## # A tibble: 12 x 1
## Topic
## <chr>
## 1 IT
## 2 Math
## 3 Arabic
## 4 Science
## 5 English
## 6 Quran
## 7 Spanish
## 8 French
## 9 History
## 10 Biology
## 11 Chemistry
## 12 Geology
performance_data %>% distinct(Relation)
## # A tibble: 2 x 1
## Relation
## <chr>
## 1 Father
## 2 Mum
performance_data %>% distinct(ParentschoolSatisfaction)
## # A tibble: 2 x 1
## ParentschoolSatisfaction
## <chr>
## 1 Good
## 2 Bad
performance_data %>% distinct(Class)
## # A tibble: 3 x 1
## Class
## <chr>
## 1 M
## 2 L
## 3 H
performance_data %>% distinct(NationalITy)
## # A tibble: 14 x 1
## NationalITy
## <chr>
## 1 KW
## 2 lebanon
## 3 Egypt
## 4 SaudiArabia
## 5 USA
## 6 Jordan
## 7 venzuela
## 8 Iran
## 9 Tunis
## 10 Morocco
## 11 Syria
## 12 Palestine
## 13 Iraq
## 14 Lybia
Declare the qualitative variables as factors so R treats them correctly
subject <-factor(performance_data$Topic, levels = c("English", "Spanish", "French", "Arabic", "IT", "Math", "Chemistry", "Biology", "Science", "History", "Quran", "Geology"))
resp_parent <- factor(performance_data$Relation, levels = c("Mum", "Father"))
Parent_Sats <- ordered(performance_data$ParentschoolSatisfaction, levels = c("Bad", "Good"))
final_grade <- ordered(performance_data$Class, levels = c("L", "M", "H"))
sex <- factor(performance_data$gender, levels = c("F", "M"))
citizenship <- factor(performance_data$NationalITy,levels = c("KW", "lebanon", "Egypt", "SaudiArabia", "USA", "Jordan", "venzuela", "Iran", "Tunis", "Morocco", "Syria", "Palestine", "Iraq", "Lybia"))
Transform the dataset so that the string variables are recoded in numeric form
stud_performance_data<-performance_data %>%
mutate (subject = as.integer(factor(Topic, levels = c("English", "Spanish", "French", "Arabic", "IT", "Math", "Chemistry", "Biology", "Science", "History", "Quran", "Geology"))),
resp_parent = as.integer(factor(Relation, levels = c("Mum", "Father"))),
Parent_Sats = as.integer(factor(ParentschoolSatisfaction, levels = c("Bad", "Good"))),
final_grade = as.integer(factor(Class, levels = c("L", "M", "H"))),
citizenship = as.integer(factor(NationalITy, levels = c("KW", "lebanon", "Egypt", "SaudiArabia", "USA", "Jordan", "venzuela", "Iran", "Tunis", "Morocco", "Syria", "Palestine", "Iraq", "Lybia"))),
sex = as.integer(factor(gender, levels = c("F", "M")))) %>%
as_tibble()
Study outliers
Explore outliers in the continuous variables using boxplots and the boxplot statistics function
boxplot(stud_performance_data$Discussion, stud_performance_data$raisedhands, stud_performance_data$VisITedResources, stud_performance_data$AnnouncementsView, horizontal=TRUE)
As can be seen from the boxplots above, no data point falls below the lower whisker or above the upper whisker in any of the four box plots, so none of the four variables contains outliers.
Besides the boxplots, boxplot statistics can also be used to check for outliers. For instance, the boxplot.stats function confirms that there are no outliers in “Discussion”
boxplot.stats(stud_performance_data$Discussion)
## $stats
## [1] 1 20 39 70 99
##
## $n
## [1] 480
##
## $conf
## [1] 35.39416 42.60584
##
## $out
## integer(0)
Check for normality of quantitative variables in the dataset
plot (density(stud_performance_data$raisedhands))
plot(density(stud_performance_data$Discussion))
hist(stud_performance_data$VisITedResources)
hist(stud_performance_data$AnnouncementsView)
As can be seen from the plots above, the distributions of “raisedhands”, “Discussion”, “VisITedResources”, and “AnnouncementsView” are not normal.
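The visual inspection can be backed up numerically with a Shapiro-Wilk test. A minimal sketch on simulated right-skewed scores (the simulated vector is a hypothetical stand-in, not one of the real variables):

```r
set.seed(42)
# Simulated right-skewed scores, roughly mimicking a non-normal activity count
skewed_scores <- rexp(480, rate = 1 / 38)

# Shapiro-Wilk: a small p-value means the normality hypothesis is rejected
sw <- shapiro.test(skewed_scores)
sw$p.value < 0.05  # TRUE for these clearly skewed scores
```

Applied to the real columns (e.g. shapiro.test(stud_performance_data$raisedhands)), the same logic would confirm the visual conclusion above.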
Visualization of the variables in the dataset
The composition of student participants across the subjects/courses can be shown with a pie chart.
pie(table(stud_performance_data$Topic))
The composition of student participants in terms of their citizenship can be communicated using a bar plot
barplot(table(stud_performance_data$NationalITy), xlab = 'participants citizenship', ylab = 'Number Of Participants')
Is there association between students’ responsible parent and their satisfaction?
table(resp_parent, Parent_Sats)
## Parent_Sats
## resp_parent Bad Good
## Mum 44 153
## Father 144 139
chisq.test(table(resp_parent, Parent_Sats))
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: table(resp_parent, Parent_Sats)
## X-squared = 38.541, df = 1, p-value = 5.363e-10
The chi-squared test indicated that there is a statistically significant association between the parent responsible for the student’s education and parents’ satisfaction (χ²(1, N = 480) = 38.54, p < .0001).
plot the result
plot(resp_parent, Parent_Sats, ylab = "Parents' satisfaction", xlab = "Responsible parent for students education")
The plot shows that a larger proportion of parents report good satisfaction when the mother is the responsible parent than when the father is.
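The strength of this association can be quantified with Cramér’s V, computed from the contingency table printed above. This is a supplementary sketch, not part of the original analysis; the standard formula is V = sqrt(X² / (N · min(r − 1, c − 1))):

```r
# Contingency table copied from the output above
tab <- matrix(c(44, 153,
                144, 139),
              nrow = 2, byrow = TRUE,
              dimnames = list(resp_parent = c("Mum", "Father"),
                              Parent_Sats = c("Bad", "Good")))

# Cramér's V = sqrt(X^2 / (N * min(rows - 1, cols - 1)))
x2 <- unname(chisq.test(tab)$statistic)  # Yates-corrected, as in the test above
v  <- sqrt(x2 / (sum(tab) * (min(dim(tab)) - 1)))
round(v, 3)  # ~0.28, a small-to-medium association
```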
Is there association between parents’ satisfaction and their children’s final grade score?
table(Parent_Sats, final_grade)
## final_grade
## Parent_Sats L M H
## Bad 84 80 24
## Good 43 131 118
chisq.test(table(Parent_Sats, final_grade))
##
## Pearson's Chi-squared test
##
## data: table(Parent_Sats, final_grade)
## X-squared = 68.47, df = 2, p-value = 1.355e-15
The chi-squared test showed a statistically significant association between parents’ satisfaction and their children’s final grade (χ²(2, N = 480) = 68.47, p < .0001).
Now show the result in plot
plot(Parent_Sats, final_grade, ylab = "Student's final grade score", xlab = "Parents satisfaction")
The plot shows that a larger proportion of students whose parents report good satisfaction scored a higher grade than students whose parents report bad satisfaction.
Is there a difference in students’ hand raising as a function of their sex?
Examine the assumptions for a parametric test to decide which statistical test to use. Independence of observations holds, since the two groups (male and female students) are independent of each other.
Draw a box plot to examine the equality of variance between the two groups
boxplot(stud_performance_data$raisedhands~sex)
The boxplot indicates that the variances of the two groups are almost the same. This can also be verified using Levene’s test
leveneTest(stud_performance_data$raisedhands~sex)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 1 0.9706 0.325
## 478
Levene’s test confirmed that the difference between the variances of the two groups is not significant.
Now conduct t-test to verify/refute the hypothesis
t.test(stud_performance_data$raisedhands~stud_performance_data$sex, mu=0, alt="two.sided", conf=0.95, var.eq=T, paired=F)
##
## Two Sample t-test
##
## data: stud_performance_data$raisedhands by stud_performance_data$sex
## t = 3.3165, df = 478, p-value = 0.0009809
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 3.904502 15.257278
## sample estimates:
## mean in group 1 mean in group 2
## 52.86286 43.28197
Calculate the standard deviations so they can be included in the report
sd(stud_performance_data$raisedhands[sex=="F"])
## [1] 30.21805
sd(stud_performance_data$raisedhands[sex=="M"])
## [1] 30.60217
So, an independent-samples t-test was conducted to compare students’ hand raising as a function of their sex. The result indicated a significant difference in hand raising between male (M = 43.28, SD = 30.60) and female (M = 52.86, SD = 30.22) students [t(478) = 3.32, p < .001].
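To convey the size of this difference, Cohen’s d can be approximated from the reported means and SDs. A supplementary sketch; since the two group SDs are nearly equal, the root mean square of the SDs is used as the pooled SD, ignoring the unequal group sizes:

```r
# Group statistics copied from the t-test report above
m_f <- 52.863; sd_f <- 30.218   # female students
m_m <- 43.282; sd_m <- 30.602   # male students

# Cohen's d = mean difference / pooled SD (equal-n approximation)
d <- (m_f - m_m) / sqrt((sd_f^2 + sd_m^2) / 2)
round(d, 2)  # ~0.32: a small-to-medium effect by Cohen's guidelines
```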
Is there a statistically significant relationship between announcement viewing and participation in discussion?
In the exploratory analysis it was shown that neither announcement viewing nor discussion is normally distributed.
So, check whether a log transformation of the scores helps achieve normality
log_Discussion<-log(stud_performance_data$Discussion)
qqnorm(log_Discussion, col='blue')
qqline(log_Discussion, col ="red")
log_AnnouncementsView<-log(stud_performance_data$AnnouncementsView + 1)
qqnorm(log_AnnouncementsView, col='blue')
qqline(log_AnnouncementsView, col ="red")
The log transformation did not bring either variable close to normality
Plot the two variables to check for a monotonic relationship, the assumption required for Kendall’s tau.
plot(stud_performance_data$Discussion, stud_performance_data$AnnouncementsView, main="scatterplot", las=1)
The plot shows that the relationship is monotonic, so the assumption is met
Now conduct correlation test
cor.test(stud_performance_data$Discussion, stud_performance_data$AnnouncementsView, method="kendall")
##
## Kendall's rank correlation tau
##
## data: stud_performance_data$Discussion and stud_performance_data$AnnouncementsView
## z = 9.1773, p-value < 2.2e-16
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
## tau
## 0.2851579
Plot the result, coloring the points by gender to make the plot more informative
stud_performance_data %>%
ggplot() +
aes(x = AnnouncementsView, y = Discussion) +
geom_point(aes(color = gender), size = 3) +
geom_smooth(method = "lm")
Kendall’s tau was computed to assess the relationship between announcement viewing and students’ participation in discussion. The result showed a significant correlation between the two variables [τ = 0.285, p < .0001].
The relationships among the variables used in the hypothesis were studied using a scatterplot matrix (used to check the linearity of the relationship between the IVs and the DV)
plot(stud_performance_data[5:7], pch=16, col="blue", main="Matrix Scatterplot of raisedhands, VisITedResources, AnnouncementsView")
As can be seen from the scatterplot matrix, the assumption of a linear relationship between the IVs and the DV is met.
The next step is to check for multicollinearity among the independent variables, starting with the correlation matrix
library(corrplot)
check_cor = cor(stud_performance_data[5:7])
corrplot(check_cor, method = "number")
As can be seen from the correlation matrix, the correlation between the two independent variables is 0.59, which is below the commonly used cutoff of .8
Fit a Linear model and continue testing its assumptions
lm1<-lm(raisedhands~VisITedResources + AnnouncementsView, data=stud_performance_data)
summary(lm1)
##
## Call:
## lm(formula = raisedhands ~ VisITedResources + AnnouncementsView,
## data = stud_performance_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -57.254 -11.179 0.393 12.918 62.537
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.63719 1.87488 3.540 0.000439 ***
## VisITedResources 0.44433 0.03506 12.673 < 2e-16 ***
## AnnouncementsView 0.41641 0.04358 9.554 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 20.41 on 477 degrees of freedom
## Multiple R-squared: 0.5621, Adjusted R-squared: 0.5602
## F-statistic: 306.1 on 2 and 477 DF, p-value: < 2.2e-16
Now assess multicollinearity using the variance inflation factor (VIF)
car::vif(lm1)
## VisITedResources AnnouncementsView
## 1.546624 1.546624
The values look acceptable, as they are not very large. However, since the average VIF is greater than 1, multicollinearity may be biasing the model slightly.
mean(car::vif(lm1))
## [1] 1.546624
So, look at the tolerance.
1/car::vif(lm1)
## VisITedResources AnnouncementsView
## 0.6465697 0.6465697
The tolerance values (about 0.65) are well above the commonly used 0.2 cutoff, so multicollinearity does not appear to be a serious problem.
Now, assess the independence of residuals
car::dwt(lm1)
## lag Autocorrelation D-W Statistic p-value
## 1 0.2779508 1.443303 0
## Alternative hypothesis: rho != 0
The Durbin-Watson test suggests some positive autocorrelation in the residuals (D-W = 1.44, p < .05), so the residuals may not be fully independent.
To check for heteroscedasticity, inspect the residual diagnostic plots.
autoplot(lm1, which = 1:6, label.size = 3)
The residuals-vs-fitted plot shows that the residuals are randomly scattered around the horizontal zero line. The Q-Q plot suggests the residuals approximately follow a normal distribution, so the residuals in this model pass the normality check.
The scale-location plot bends slightly away from the ideal horizontal line, but it still indicates that the residuals have approximately uniform variance across the range.
Examine the flagged observations to see whether there is a concrete reason to eliminate them; none is apparent, so they are retained.
stud_performance_data %>%
slice(c(96, 178, 187, 345, 382))
## # A tibble: 5 x 16
## gender NationalITy Topic Relation raisedhands VisITedResources
## <chr> <chr> <chr> <chr> <int> <int>
## 1 F KW IT Father 100 80
## 2 F USA French Mum 15 52
## 3 M KW Arabic Mum 85 15
## 4 F Jordan French Mum 14 97
## 5 F Jordan Arabic Father 10 12
## # ... with 10 more variables: AnnouncementsView <int>, Discussion <int>,
## # ParentschoolSatisfaction <chr>, Class <chr>, subject <int>,
## # resp_parent <int>, Parent_Sats <int>, final_grade <int>,
## # citizenship <int>, sex <int>
Fit a second model that includes both predictors and their interaction.
lm2<-lm(raisedhands~VisITedResources*AnnouncementsView, data=stud_performance_data)
summary(lm2)
##
## Call:
## lm(formula = raisedhands ~ VisITedResources * AnnouncementsView,
## data = stud_performance_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -57.386 -11.488 -0.135 12.794 62.720
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.741111 2.690976 2.877 0.00420
## VisITedResources 0.423887 0.050066 8.467 3.16e-16
## AnnouncementsView 0.359233 0.109016 3.295 0.00106
## VisITedResources:AnnouncementsView 0.000840 0.001468 0.572 0.56741
##
## (Intercept) **
## VisITedResources ***
## AnnouncementsView **
## VisITedResources:AnnouncementsView
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 20.43 on 476 degrees of freedom
## Multiple R-squared: 0.5624, Adjusted R-squared: 0.5596
## F-statistic: 203.9 on 3 and 476 DF, p-value: < 2.2e-16
The p value indicates that the interaction between the two independent variables does not contribute significantly to the model. Compared with the first model, the F-statistic dropped from 306.1 to 203.9, while the residual standard error and adjusted R-squared barely changed.
The VIF of the model shows that the interaction term has a score of 15.01, which is larger than 10, so multicollinearity is an issue here.
car::vif(lm2)
## VisITedResources AnnouncementsView
## 3.149303 9.662806
## VisITedResources:AnnouncementsView
## 15.007122
In addition, the average VIF (9.27) confirms that multicollinearity is biasing the model.
mean(car::vif(lm2))
## [1] 9.273077
The tolerance values also indicate that, except for visiting resources, the other two terms (announcement viewing and the interaction) fall below acceptable levels.
1/car::vif(lm2)
## VisITedResources AnnouncementsView
## 0.31753057 0.10348960
## VisITedResources:AnnouncementsView
## 0.06663503
Measure the independence of residuals
car::dwt(lm2)
## lag Autocorrelation D-W Statistic p-value
## 1 0.2785941 1.441987 0
## Alternative hypothesis: rho != 0
It seems that the model has significant residual autocorrelation, so the residuals are not independent. To check for heteroscedasticity, inspect the residual diagnostic plots.
autoplot(lm2, which = 1:6, label.size = 3)
Fit a third model that includes only the interaction between the two predictors, and continue testing the linear regression assumptions
lm3<-lm(raisedhands~VisITedResources:AnnouncementsView, data=stud_performance_data)
summary(lm3)
##
## Call:
## lm(formula = raisedhands ~ VisITedResources:AnnouncementsView,
## data = stud_performance_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -61.828 -16.113 -3.039 13.605 74.693
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.390e+01 1.453e+00 16.45 <2e-16
## VisITedResources:AnnouncementsView 8.798e-03 4.059e-04 21.67 <2e-16
##
## (Intercept) ***
## VisITedResources:AnnouncementsView ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 21.88 on 478 degrees of freedom
## Multiple R-squared: 0.4956, Adjusted R-squared: 0.4946
## F-statistic: 469.7 on 1 and 478 DF, p-value: < 2.2e-16
In this model, the p value indicates that the interaction of the independent variables contributes significantly to the model.
Assess the independence of residuals in the model.
car::dwt(lm3)
## lag Autocorrelation D-W Statistic p-value
## 1 0.2484915 1.50235 0
## Alternative hypothesis: rho != 0
The result again suggests autocorrelation in the residuals, so the residuals may not be independent in this model either.
To check for heteroscedasticity, inspect the residual diagnostic plots of the model.
autoplot(lm3, which = 1:6, label.size = 3)
Compared with the first model, the adjusted R-squared decreased while the residual standard error stayed almost the same. On the other hand, the F-statistic improved from 306.1 to 469.7 on almost the same degrees of freedom.
Now, compute confidence intervals for the parameters.
confint(lm1, level = 0.95)
## 2.5 % 97.5 %
## (Intercept) 2.9531474 10.3212263
## VisITedResources 0.3754325 0.5132187
## AnnouncementsView 0.3307687 0.5020486
confint(lm2, level = 0.95)
## 2.5 % 97.5 %
## (Intercept) 2.453449404 13.028772138
## VisITedResources 0.325508381 0.522265018
## AnnouncementsView 0.145020815 0.573445233
## VisITedResources:AnnouncementsView -0.002044292 0.003724304
confint(lm3, level = 0.95)
## 2.5 % 97.5 %
## (Intercept) 21.044274119 26.754756871
## VisITedResources:AnnouncementsView 0.008000253 0.009595484
Model Comparison
compare the models using broom::glance()
glance(lm1)
## r.squared adj.r.squared sigma statistic p.value df logLik
## 1 0.5620764 0.5602403 20.41105 306.1156 2.975109e-86 3 -2127.303
## AIC BIC deviance df.residual
## 1 4262.605 4279.3 198723.5 477
glance(lm2)
## r.squared adj.r.squared sigma statistic p.value df logLik
## 1 0.5623775 0.5596194 20.42546 203.8985 4.998754e-85 4 -2127.137
## AIC BIC deviance df.residual
## 1 4264.275 4285.144 198586.8 476
glance(lm3)
## r.squared adj.r.squared sigma statistic p.value df logLik
## 1 0.4956469 0.4945917 21.88159 469.7486 4.646176e-73 2 -2161.198
## AIC BIC deviance df.residual
## 1 4328.396 4340.918 228868.2 478
Based on the principle of choosing the model with the largest log-likelihood and the smallest AIC and BIC at comparable df, the first model (lm1) is selected.
To confirm the model selection, the anova function can be used
anova(lm1, lm2)
## Analysis of Variance Table
##
## Model 1: raisedhands ~ VisITedResources + AnnouncementsView
## Model 2: raisedhands ~ VisITedResources * AnnouncementsView
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 477 198723
## 2 476 198587 1 136.63 0.3275 0.5674
anova(lm1, lm3)
## Analysis of Variance Table
##
## Model 1: raisedhands ~ VisITedResources + AnnouncementsView
## Model 2: raisedhands ~ VisITedResources:AnnouncementsView
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 477 198723
## 2 478 228868 -1 -30145 72.357 2.334e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The ANOVA model comparison showed no significant difference between the first and the second models (F = 0.33, p = 0.567, df = 1).
The F value indicated a significant difference between the first and the third models, in favor of the first. Therefore, the ANOVA analysis also confirmed that the first model is the winner.
The normal distribution of the residuals in the winning model can be confirmed again as follows
stud_performance_data %>%
augment(lm(raisedhands~VisITedResources + AnnouncementsView, data = .), .) %>%
ggplot() +
aes(.resid) +
geom_histogram(bins = 10)
The residuals of the selected model are approximately normally distributed!
This can also be confirmed with a statistical test
stud_performance_data %>%
augment(lm(raisedhands ~ VisITedResources*AnnouncementsView, data = .), .) %>%
pull(.resid) %>%
shapiro.test(.)
##
## Shapiro-Wilk normality test
##
## data: .
## W = 0.99443, p-value = 0.07819
The Shapiro-Wilk test also confirmed that the residuals are normally distributed (p = 0.078). Therefore, the model is valid.
Explore the joint effect of visiting resources and announcement viewing on hand raising using the coplot function.
coplot(raisedhands~VisITedResources|AnnouncementsView, panel = panel.smooth, stud_performance_data)
Load the plotly package and plot the model data in 3D.
library(plotly)
plot_ly(stud_performance_data, y= ~stud_performance_data$raisedhands, x= ~stud_performance_data$AnnouncementsView, z= ~stud_performance_data$VisITedResources)
## No trace type specified:
## Based on info supplied, a 'scatter3d' trace seems appropriate.
## Read more about this trace type -> https://plot.ly/r/reference/#scatter3d
## No scatter3d mode specifed:
## Setting the mode to markers
## Read more about this attribute -> https://plot.ly/r/reference/#scatter-mode
Now, report the results of the multiple regression analysis in table form.
library(stargazer)
library(knitr)
# Plain lm objects have no $standardized.coefficients element, so no coef
# override is needed here; stargazer uses the raw coefficients by default
stargazer(lm1,
lm2,
lm3,
title = "Multiple regression analysis result",
dep.var.labels = "Hand Raising",
align = TRUE,
ci = TRUE,
df = TRUE,
digits = 2,
type = "html")
Dependent variable: Hand Raising

| | (1) | (2) | (3) |
|---|---|---|---|
| VisITedResources | 0.44*** (0.38, 0.51) | 0.42*** (0.33, 0.52) | |
| AnnouncementsView | 0.42*** (0.33, 0.50) | 0.36*** (0.15, 0.57) | |
| VisITedResources:AnnouncementsView | | 0.001 (-0.002, 0.004) | 0.01*** (0.01, 0.01) |
| Constant | 6.64*** (2.96, 10.31) | 7.74*** (2.47, 13.02) | 23.90*** (21.05, 26.75) |
| Observations | 480 | 480 | 480 |
| R2 | 0.56 | 0.56 | 0.50 |
| Adjusted R2 | 0.56 | 0.56 | 0.49 |
| Residual Std. Error | 20.41 (df = 477) | 20.43 (df = 476) | 21.88 (df = 478) |
| F Statistic | 306.12*** (df = 2; 477) | 203.90*** (df = 3; 476) | 469.75*** (df = 1; 478) |

Note: *p<0.1; **p<0.05; ***p<0.01
To report the results on a standardized scale, transform the unstandardized coefficients using the lm.beta package
library(lm.beta)
Create standardized versions of all model objects
lm1_std <- lm.beta(lm1)
lm2_std <- lm.beta(lm2)
lm3_std <- lm.beta(lm3)
Explicitly tell stargazer which coefficients to display
stargazer(lm1_std,
lm2_std,
lm3_std,
coef = list(lm1_std$standardized.coefficients,
lm2_std$standardized.coefficients,
lm3_std$standardized.coefficients),
title = "Result of multiple regression analysis (standardized)",
dep.var.labels = "Raising Hand",
align = TRUE,
ci = TRUE,
df = TRUE,
digits = 2,
type = "html")
Dependent variable: Raising Hand

| | (1) | (2) | (3) |
|---|---|---|---|
| VisITedResources | 0.48*** (0.41, 0.55) | 0.46*** (0.36, 0.55) | |
| AnnouncementsView | 0.36*** (0.27, 0.45) | 0.31*** (0.10, 0.52) | |
| VisITedResources:AnnouncementsView | | 0.07*** (0.06, 0.07) | 0.70*** (0.70, 0.70) |
| Constant | 0.00 (-3.67, 3.67) | 0.00 (-5.27, 5.27) | 0.00 (-2.85, 2.85) |
| Observations | 480 | 480 | 480 |
| R2 | 0.56 | 0.56 | 0.50 |
| Adjusted R2 | 0.56 | 0.56 | 0.49 |
| Residual Std. Error | 20.41 (df = 477) | 20.43 (df = 476) | 21.88 (df = 478) |
| F Statistic | 306.12*** (df = 2; 477) | 203.90*** (df = 3; 476) | 469.75*** (df = 1; 478) |

Note: *p<0.1; **p<0.05; ***p<0.01
A multiple linear regression was calculated to predict hand raising from visiting resources and viewing announcements. A significant regression equation was found [F(2, 477) = 306.12, p < .0001], with R² = 0.56. Participants’ predicted hand raising equals 6.64 + 0.44 × (visited resources) + 0.42 × (announcements viewed). Both resource visiting and announcement viewing were significant predictors of hand-raising frequency.
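As a quick sanity check, the reported equation can be turned into a small prediction function. The coefficients are copied (rounded) from summary(lm1) above, and the example inputs are the medians from the earlier summary output:

```r
# Rounded coefficients from the fitted model lm1
b0 <- 6.637   # intercept
b1 <- 0.444   # VisITedResources slope
b2 <- 0.416   # AnnouncementsView slope

predict_raisedhands <- function(visited, announcements) {
  b0 + b1 * visited + b2 * announcements
}

# A student at the medians (VisITedResources = 65, AnnouncementsView = 33)
round(predict_raisedhands(65, 33), 1)  # ~49.2 expected hand raisings
```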
The main goal of this project was to examine the effect of applying leading-edge technology in education on students’ academic behavior. To achieve this goal, several research questions were formulated and the relevant statistical tests were employed. Based on the results, it was concluded that:
1. Students’ academic performance tends to be better when the mother is the parent responsible for the student’s education, and when parents are satisfied with their children’s education.
2. Students’ hand-raising behavior is associated with their sex; female students raise their hands more often than male students.
3. There is an association between announcement viewing and participation in discussion.
4. Visiting resources and viewing announcements are significant predictors of hand raising.